A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle

arXiv.org Machine Learning

Q-learning is one of the simplest and most popular algorithms in the reinforcement learning (RL) community [Sutton and Barto, 2018]. However, Q-learning can diverge when (linear) function approximation is applied [Baird, 1995, Tsitsiklis and Van Roy, 1997]. To address this instability, a technique called the target network was proposed in the well-known DQN algorithm [Mnih et al., 2015]. In particular, DQN maintains a copy of the main Q-network (the so-called target network), which is used to generate the bootstrap signal for updates. An important feature is that the target network is held fixed over intervals: unlike in Q-learning, the learning targets in DQN do not change during an interval. Mnih et al. [2015, Table 3] report that the target network contributes substantially to the superior performance of DQN.
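The frozen-target idea can be illustrated in the tabular setting. The following is a minimal sketch, not the DQN implementation and not necessarily the algorithm analyzed in this note: the function name `target_q_learning`, its parameters, the uniform sampling of state-action pairs, and the simulation of the generative oracle from a known transition tensor `P` are all assumptions made for illustration.

```python
import numpy as np


def target_q_learning(P, R, gamma=0.9, n_intervals=50,
                      steps_per_interval=200, lr=0.1, seed=0):
    """Tabular Q-learning with a target table that is frozen per interval.

    P: array of shape (S, A, S), transition probabilities (assumed known
       here only so that a generative oracle can be simulated).
    R: array of shape (S, A), deterministic rewards.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    q = np.zeros((n_states, n_actions))

    for _ in range(n_intervals):
        # Copy the main table; the copy stays fixed for the whole interval.
        q_target = q.copy()
        for _ in range(steps_per_interval):
            # Assumption: state-action pairs are queried uniformly at random.
            s = rng.integers(n_states)
            a = rng.integers(n_actions)
            # Simulated generative oracle: sample a next state from P[s, a].
            s_next = rng.choice(n_states, p=P[s, a])
            # Bootstrap from the frozen copy, not from the table being updated.
            target = R[s, a] + gamma * q_target[s_next].max()
            q[s, a] += lr * (target - q[s, a])
    return q
```

Within one interval the bootstrap values come only from `q_target`, so the regression targets stay fixed while `q` is updated. Plain Q-learning corresponds to replacing `q_target[s_next]` with `q[s_next]`, in which case the targets move after every update.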